Graphical Analysis

Pyevolve comes with a Graphical Plotting Tool, based on the Matplotlib plotting library.

To use this tool, you need to attach the DBAdapters.DBSQLite adapter to your GA. The adapter creates a database file in which the statistics and population of each generation are stored.

We are going to extend the first example with the database and graphical output.


In [ ]:
from pyevolve import G1DList, GSimpleGA
from pyevolve import DBAdapters

In [ ]:
def eval_func(chromosome):
    """Return the raw score: the number of zeros in the chromosome."""
    score = 0.0
    for value in chromosome:
        if value == 0:
            score += 1.0
    return score

In [ ]:
genome = G1DList.G1DList(20)
genome.evaluator.set(eval_func)
genome.setParams(rangemin=0, rangemax=10)

The database adapter is defined in the following cell. The database is stored in a file, and each evolution saved in it needs an identifier. We will always use the same identifier here, but you can change it if you want to save several evolutions in the same database. Setting resetDB=True deletes any existing data in the database.


In [ ]:
sqlite_adapter = DBAdapters.DBSQLite(dbname='first_example.db', identify="ex1", resetDB=True)

When you run your GA, all the statistics will be dumped to this database. When you use the graph tool, it will read the statistics from this database file.

Let's evolve the example. Now, instead of evolving step by step, we will set a number of generations for completing the evolution with a single call to ga.evolve.


In [ ]:
ga = GSimpleGA.GSimpleGA(genome)
ga.setDBAdapter(sqlite_adapter)
ga.setGenerations(20)
ga.evolve(freq_stats=5)
print("Generation: %d" % ga.currentGeneration)
best = ga.bestIndividual()
print('\tBest individual: %s' % str(best.genomeList))
print('\tBest score: %.0f' % best.score)
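
Once the run has finished, the statistics live in an ordinary SQLite file, so Python's standard sqlite3 module can be used to look inside it. The following is a minimal sketch and not part of Pyevolve: list_tables and count_rows are hypothetical helper names, and the in-memory table demo_stats with its columns is a made-up stand-in so the code runs on its own. The real schema written by DBSQLite is not described in this tutorial.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite connection."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def count_rows(connection, table):
    """Return the number of rows stored in the given table."""
    query = 'SELECT COUNT(*) FROM "%s"' % table
    return connection.execute(query).fetchone()[0]


# Stand-in database: the table name and columns are hypothetical,
# they only exist so that this sketch is self-contained.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE demo_stats (identify TEXT, generation INTEGER, raw_max REAL)"
)
demo.executemany(
    "INSERT INTO demo_stats VALUES (?, ?, ?)",
    [("ex1", 0, 3.0), ("ex1", 1, 5.0)],
)

for table in list_tables(demo):
    print("%s: %d rows" % (table, count_rows(demo, table)))
```

To inspect the file created above, open it with sqlite3.connect('first_example.db') and pass that connection to list_tables.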

Plotting

The main graph types are described below. For most of them you can choose to plot either the raw score or the fitness score, which are defined as follows:

  • The raw score is the score returned by the evaluation function; it is not scaled.
  • The fitness score is the raw score after scaling. For example, if you use Linear Scaling (Scaling.LinearScaling()), the fitness score is the raw score scaled with the Linear Scaling method. The fitness score represents how good an individual is relative to the rest of the population.
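
To make the raw/fitness distinction concrete, here is a simplified sketch of classic linear scaling, which maps each raw score f to a*f + b so that the population average is preserved and the best individual receives c times the average. This is an illustration under stated assumptions, not Pyevolve's implementation: the function name linear_scale, the default c=2.0 and the clamping of negative results to zero are choices made for this example.

```python
def linear_scale(raw_scores, c=2.0):
    """Scale raw scores linearly (illustrative, not Pyevolve's code).

    The scaled average equals the raw average, and the scaled maximum
    equals c times the average. Negative results are clamped to zero.
    """
    avg = sum(raw_scores) / float(len(raw_scores))
    best = max(raw_scores)
    delta = best - avg
    if delta == 0:
        # All individuals have the same score: nothing to scale.
        return list(raw_scores)
    a = (c - 1.0) * avg / delta
    b = avg * (best - c * avg) / delta
    return [max(0.0, a * score + b) for score in raw_scores]


raw = [2.0, 4.0, 4.0, 6.0]
fitness = linear_scale(raw)
print("raw:    ", raw)
print("fitness:", fitness)
```

In this small population the average stays at 4.0 while the gap between the best and the worst individual widens, which is the selection pressure that scaling is meant to control.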

In [ ]:
%matplotlib inline
from pyevolve_plot import plot_errorbars_raw, plot_errorbars_fitness, \
                          plot_maxmin_raw, plot_maxmin_fitness, \
                          plot_diff_raw, plot_pop_heatmap_raw

Error bars graph (raw scores)

In this graph, the generations are on the x-axis and the raw scores on the y-axis. Each green vertical bar spans the minimum to the maximum raw score of the population at the generation indicated on the x-axis. The blue line running through the bars is the average raw score of the population.


In [ ]:
plot_errorbars_raw('first_example.db','ex1')
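
The data behind such a graph can be sketched in plain Python. The example below assumes you already have the raw scores of each generation as a list of lists; errorbar_data is a hypothetical helper, not a function of Pyevolve or of pyevolve_plot. It returns the averages and the asymmetric bar lengths in the two-row form (lower offsets, then upper offsets) that Matplotlib's errorbar accepts for its yerr argument.

```python
def errorbar_data(generations):
    """Compute per-generation averages and asymmetric error bar lengths.

    generations: list of lists, the raw scores of each generation.
    Returns (averages, [lower, upper]) where lower[i] is the distance
    from the average down to the minimum and upper[i] is the distance
    from the average up to the maximum.
    """
    averages, lower, upper = [], [], []
    for scores in generations:
        avg = sum(scores) / float(len(scores))
        averages.append(avg)
        lower.append(avg - min(scores))
        upper.append(max(scores) - avg)
    return averages, [lower, upper]


# Made-up scores for two generations, for illustration only.
demo_scores = [[0.0, 1.0, 2.0], [1.0, 2.0, 6.0]]
averages, yerr = errorbar_data(demo_scores)
print("averages:", averages)
print("lower:   ", yerr[0])
print("upper:   ", yerr[1])
```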

Error bars graph (fitness scores)

The difference between this graph and the previous one is that it uses the fitness scores instead of the raw scores.


In [ ]:
plot_errorbars_fitness('first_example.db','ex1')

Max/min/avg/std. dev. graph (raw scores)

In this graph, the green line shows the maximum raw score at the generation on the x-axis, the red line shows the minimum raw score, and the blue line shows the average raw score. The green shaded region represents the difference between the maximum and minimum raw scores. The black line shows the standard deviation of the raw scores. The graph also carries annotations for the maximum raw score, the maximum standard deviation and the minimum standard deviation.


In [ ]:
plot_maxmin_raw('first_example.db','ex1')
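
The quantities drawn in this graph can be reproduced with Python's standard statistics module. The sketch below uses made-up scores and a hypothetical helper named summarize. It uses the population standard deviation (statistics.pstdev), which is an assumption: this tutorial does not say whether Pyevolve uses the population or the sample formula.

```python
import statistics


def summarize(generations):
    """Return one dict of max/min/avg/std. dev. per generation."""
    summary = []
    for scores in generations:
        summary.append({
            "max": max(scores),
            "min": min(scores),
            "avg": statistics.mean(scores),
            # Assumption: population standard deviation.
            "dev": statistics.pstdev(scores),
        })
    return summary


# Made-up raw scores for three generations, for illustration only.
demo_scores = [[0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 3.0, 3.0], [3.0, 3.0, 3.0, 3.0]]
stats = summarize(demo_scores)

# Locate the generations that the annotations would point at.
devs = [entry["dev"] for entry in stats]
gen_max_dev = devs.index(max(devs))
gen_min_dev = devs.index(min(devs))
print("max std. dev. at generation", gen_max_dev)
print("min std. dev. at generation", gen_min_dev)
```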

Max/min/avg/std. dev. graph (fitness scores)

In this graph, the green line shows the maximum fitness score of the population at the generation on the x-axis. The red line shows the minimum fitness score and the blue line shows the average fitness score of the population. The green shaded region between the green and red lines shows the difference between the best and worst individuals of the population.


In [ ]:
plot_maxmin_fitness('first_example.db','ex1')

Min/max difference graph, raw and fitness scores

This graph has two subplots. The first shows the difference between the raw scores of the best and worst individuals. The second shows the difference between the fitness scores of the best and worst individuals. Both subplots show the generation on the x-axis and the score difference on the y-axis.


In [ ]:
plot_diff_raw('first_example.db','ex1')

Heat map of population raw score distribution

The heat map plots the individuals of the population on the x-axis and the generations on the y-axis. The legend on the right shows the relation between color and score. As you can see, in the initial populations the scores of the last individuals are the worst (represented in this colormap by dark blue). The graph is drawn using the Gaussian interpolation method.


In [ ]:
plot_pop_heatmap_raw('first_example.db','ex1')
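
The grid behind a heat map like this one can be sketched as a list of rows, one per generation, with one column per individual. The example below is an illustration with made-up scores, not the code used by pyevolve_plot: heatmap_matrix is a hypothetical helper, and sorting each generation from best to worst is an assumption suggested by the worst scores appearing among the last individuals.

```python
def heatmap_matrix(generations):
    """Build a generation-by-individual grid of raw scores.

    generations: list of lists, the raw scores of each generation.
    Each row is sorted from best to worst (an assumption made for
    this sketch), so the worst individuals end up on the right.
    """
    return [sorted(scores, reverse=True) for scores in generations]


# Made-up raw scores for two generations, for illustration only.
demo_scores = [[1.0, 3.0, 0.0], [2.0, 4.0, 2.0]]
matrix = heatmap_matrix(demo_scores)
for generation, row in enumerate(matrix):
    print("generation %d: %s" % (generation, row))
```

A grid like this can be handed to Matplotlib's imshow, whose interpolation argument accepts "gaussian", to obtain a smoothed picture similar to the one above.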
